Abstract. In 2007, we applied MCMC to approximately calculate the ratio of essential 
graphs (EGs) to directed acyclic graphs (DAGs) for up to 20 nodes. In the present paper, 
we extend our previous work from 20 to 31 nodes. We also extend our previous work by 
computing the approximate ratio of connected EGs to connected DAGs, of connected EGs 
to EGs, and of connected DAGs to DAGs. Furthermore, we prove that the latter ratio is 
asymptotically 1. We also discuss the implications of these results for learning DAGs from 
data. 



1. Introduction 

Probably the most common approach to learning directed acyclic graph (DAG) models¹
from data, also known as Bayesian network models, is to search the space
of either DAGs or DAG models. In the latter case, DAG models are typically represented as
essential graphs (EGs). Knowing the ratio of EGs to DAGs for a given number of nodes is
valuable when deciding which space to search. For instance, if the ratio is
low, then one may prefer to search the space of EGs rather than the space of DAGs, though the
latter is usually considered easier to traverse. Unfortunately, while the number of DAGs can
be computed without enumerating them all (Robinson, 1977, Equation 8), the only method
for counting EGs that we are aware of is enumeration. Specifically, Gillispie and Perlman
(2002) enumerated all the EGs for up to 10 nodes by means of a computer program. They
showed that the ratio is around 0.27 for 7-10 nodes. They also conjectured a similar ratio
for more than 10 nodes by extrapolating the exact ratios for up to 10 nodes. Later, the
asymptotic ratio was proven to be around 0.26 (Garrido, 2009).

Enumerating EGs for more than 10 nodes seems challenging: To enumerate all the EGs
over 10 nodes, the computer program of Gillispie and Perlman (2002) needed 2253 hours on
a "mid-1990s-era, midrange minicomputer". We obviously prefer to know the exact ratio of
EGs to DAGs for a given number of nodes rather than an approximation to it. However, an
approximate ratio may be easier to obtain and may serve as well as the exact one to decide which
space to search. In 2007, we proposed a Markov chain Monte Carlo (MCMC) approach
to approximately calculate the ratio while avoiding enumerating EGs (Peña, 2007). Our
proposal consisted of the following steps. First, we constructed a Markov chain (MC) whose
stationary distribution was uniform over the space of EGs for the given number of nodes.
Then, we sampled that stationary distribution and computed the ratio R of essential DAGs
(EDAGs) to EGs in the sample. Finally, we transformed this approximate ratio into the
desired approximate ratio of EGs to DAGs as follows: Since #EGs/#DAGs² can be expressed as
(#EDAGs/#DAGs) / (#EDAGs/#EGs), we can approximate it by (#EDAGs/#DAGs) / R, where
#DAGs and #EDAGs can be computed via Robinson (1977, Equation 8) and Steinsky (2003,
p. 270), respectively.
We reported the so-obtained approximate ratio for up to 20 nodes. The approximate ratios
agreed well with the exact ones available in the literature and suggested that the exact ratios
are not very low (the approximate ratios were 0.26-0.27 for 7-20 nodes). This indicates that
one should not expect more than a moderate gain in efficiency when searching the space
of EGs instead of the space of DAGs. Of course, this is a somewhat bold claim, since the
gain is dictated by the average ratio over the EGs visited during the search and not by the
average ratio over all the EGs in the search space. For instance, the gain is not the same
if we visit the empty EG, whose ratio is 1, or the complete EG, whose ratio is 1/n! for n
nodes. Unfortunately, it is impossible to know beforehand which EGs will be visited during
the search. Therefore, the best we can do is to draw (bold) conclusions based on the average
ratio over all the EGs in the search space.

¹ All the graphs considered in this paper are labeled graphs.
² We use the symbol # followed by a class of graphs to denote the cardinality of the class.
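
The transformation from the sample ratio R to the ratio of EGs to DAGs can be sketched in Python as follows. This is only a minimal illustration, not the C++ program used in the experiments. It assumes that the cited counting results correspond to the well-known inclusion-exclusion recurrences for labeled DAGs and for labeled essential DAGs; the function names (count_dags, count_edags, eg_to_dag_ratio) are hypothetical.

```python
from fractions import Fraction
from math import comb


def count_dags(n_max):
    """Numbers of labeled DAGs on 0..n_max nodes (assumed form of the DAG recurrence)."""
    a = [1]
    for n in range(1, n_max + 1):
        a.append(sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * a[n - k]
                     for k in range(1, n + 1)))
    return a


def count_edags(n_max):
    """Numbers of labeled essential DAGs on 0..n_max nodes (assumed form of the EDAG recurrence)."""
    e = [1]
    for n in range(1, n_max + 1):
        e.append(sum((-1) ** (k + 1) * comb(n, k)
                     * (2 ** (n - k) - (n - k)) ** k * e[n - k]
                     for k in range(1, n + 1)))
    return e


def eg_to_dag_ratio(n, r):
    """Approximate #EGs/#DAGs as (#EDAGs/#DAGs) / R, with R the sample ratio #EDAGs/#EGs."""
    edags_over_dags = Fraction(count_edags(n)[n], count_dags(n)[n])
    return float(edags_over_dags / Fraction(r))
```

For instance, feeding the exact 4-node value of #EDAGs/#EGs (59/185) into eg_to_dag_ratio recovers 185/543, i.e. the exact 4-node ratio of about 0.34070.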

In this paper, we extend our previous work from 20 to 31 nodes. We also extend our
previous work by reporting approximate ratios that we did not report before. Specifically,
we report the approximate ratio of connected EGs (CEGs) to connected DAGs (CDAGs),
of CEGs to EGs, and of CDAGs to DAGs. We elaborate later on why these ratios are of
interest. The approximate ratio of CEGs to CDAGs is computed from the sample as follows.
First, we compute the ratio R' of EDAGs to CEGs in the sample. Second, we transform this
approximate ratio into the desired approximate ratio of CEGs to CDAGs as follows: Since
#CEGs/#CDAGs can be expressed as (#EDAGs/#CDAGs) / (#EDAGs/#CEGs), we can approximate
it by (#EDAGs/#CDAGs) / R', where #EDAGs can be computed by Steinsky (2003, p. 270) and
#CDAGs can be computed as shown in Appendix A. The approximate ratio of CEGs to EGs is
computed directly from the sample. The approximate ratio of CDAGs to DAGs is computed
with the help of Appendix A and Robinson (1977, Equation 8).



The computer program implementing the MCMC approach described above is essentially
the same as in our previous work (it has only been modified to report whether the EGs
sampled are connected or not).³ The program is written in C++ and compiled in Microsoft
Visual C++ 2010 Express. The experiments are run on an AMD Athlon 64 X2 Dual Core
Processor 5000+ 2.6 GHz, 4 GB RAM and Windows Vista Business. The compiler and the
computer used in our previous work were Microsoft Visual C++ 2008 Express and a Pentium
2.4 GHz, 512 MB RAM and Windows 2000. The experimental setting is the same as before
for up to 30 nodes, i.e. each approximate ratio reported is based on a sample of 10^4 EGs,
each obtained as the state of the MC after performing 10^6 transitions with the empty EG as
initial state. For 31 nodes, though, each EG sampled is obtained as the state of the MC after
performing 2 x 10^6 transitions with the empty EG as initial state. We elaborate later on why
we double the length of the MCs for 31 nodes.
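
The sampling protocol just described (one independent chain per sampled state, each started from the same initial state) can be sketched as follows. This is a hedged illustration only: the actual transitions between EGs are not reproduced here, so a toy chain (a lazy random walk on a cycle, whose stationary distribution is uniform) stands in for the MC over EGs. All names are hypothetical.

```python
import random


def sample_independent_chains(step, initial_state, n_samples, n_transitions, rng):
    """Return n_samples states, each the final state of a fresh chain of n_transitions steps."""
    samples = []
    for _ in range(n_samples):
        state = initial_state
        for _ in range(n_transitions):
            state = step(state, rng)
        samples.append(state)
    return samples


def lazy_cycle_step(state, rng, size=8):
    """Toy stand-in for an EG transition: move left, stay, or move right on a cycle."""
    return (state + rng.choice((-1, 0, 1))) % size


def fraction_with(samples, predicate):
    """Fraction of sampled states satisfying a predicate (e.g. 'is an EDAG', 'is connected')."""
    return sum(1 for s in samples if predicate(s)) / len(samples)
```

In the setting of this paper, the initial state would be the empty EG, n_samples would be 10^4 and n_transitions would be 10^6 (2 x 10^6 for 31 nodes); the predicate would test whether a sampled EG is an EDAG or whether it is connected.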

The rest of the paper is organized as follows. In Section 2, we extend our previous work
from 20 to 31 nodes. In Section 3, we extend our previous work with new approximate
ratios. In Section 4, we recall our findings and discuss future work. The paper ends with two
appendices devoted to technical details.

2. Extension of Previous Work from 20 to 31 Nodes 

Table 1 presents our new approximate ratios, together with our old ones and the exact ones
available in the literature. The first conclusion that we draw from the table is that the new
ratios are very close to the exact ones, as well as to our old ones. This makes us confident about
the accuracy of the ratios for 11-31 nodes, where no exact ratios are available in the literature
due to the high computational cost involved in calculating them. Another conclusion that we
draw from the table is that the ratios seem to be 0.26-0.28 for 11-31 nodes. This agrees well
with the conjectured ratio of 0.27 for more than 10 nodes reported by Gillispie and Perlman
(2002) and with the asymptotic ratio of 0.26 reported by Garrido (2009). A last conclusion
that we draw from the table is that the fraction of EGs that represent a unique DAG, i.e.
#EDAGs/#EGs, is 0.26-0.28 for 11-31 nodes, a substantial fraction. This agrees with the
asymptotic value of 0.28 that we calculate as follows: Since #EDAGs/#EGs can be expressed as
(#EDAGs/#DAGs) / (#EGs/#DAGs), we can approximate its asymptotic value by the quotient of
the asymptotic values of #EDAGs/#DAGs and #EGs/#DAGs, thanks to the results by Garrido
(2009) and Steinsky (2004).

³ The original and the modified programs are available at www.ida.liu.se/~jospe.

Table 1. Exact and approximate #EGs/#DAGs and #EDAGs/#EGs.

       EXACT                            OLD APPROXIMATE                  NEW APPROXIMATE
NODES  #EGs/#DAGs  #EDAGs/#EGs  Hours   #EGs/#DAGs  #EDAGs/#EGs  Hours   #EGs/#DAGs  #EDAGs/#EGs  Hours
2      0.66667     0.50000      0.0     0.66007     0.50500      3.5     0.67654     0.49270      1.3
3      0.44000     0.36364      0.0     0.43704     0.36610      5.2     0.44705     0.35790      1.0
4      0.34070     0.31892      0.0     0.33913     0.32040      6.8     0.33671     0.32270      1.2
5      0.29992     0.29788      0.0     0.30132     0.29650      8.0     0.29544     0.30240      1.4
6      0.28238     0.28667      0.0     0.28118     0.28790      9.4     0.28206     0.28700      1.6
7      0.27443     0.28068      0.0     0.27228     0.28290      12.4    0.27777     0.27730      2.0
8      0.27068     0.27754      0.0     0.26984     0.27840      13.8    0.26677     0.28160      2.3
9      0.26888     0.27590      7.0     0.27124     0.27350      16.5    0.27124     0.27350      2.6
10     0.26799     0.27507      2253.0  0.26690     0.27620      18.8    0.26412     0.27910      3.1
11     -           -            -       0.26179     0.28070      20.4    0.26179     0.28070      3.8
12     -           -            -       0.26737     0.27440      21.9    0.26825     0.27350      4.2
13     -           -            -       0.26098     0.28090      23.3    0.27405     0.26750      4.5
14     -           -            -       0.26560     0.27590      25.3    0.27161     0.26980      5.1
15     -           -            -       0.27125     0.27010      25.6    0.26250     0.27910      5.7
16     -           -            -       0.25777     0.28420      27.3    0.26943     0.27190      6.7
17     -           -            -       0.26667     0.27470      29.9    0.26942     0.27190      7.6
18     -           -            -       0.25893     0.28290      37.4    0.27040     0.27090      8.2
19     -           -            -       0.26901     0.27230      38.1    0.27130     0.27000      9.0
20     -           -            -       0.27120     0.27010      40.3    0.26734     0.27400      9.9
21     -           -            -       -           -            -       0.26463     0.27680      17.4
22     -           -            -       -           -            -       0.27652     0.26490      18.8
23     -           -            -       -           -            -       0.26569     0.27570      13.3
24     -           -            -       -           -            -       0.27030     0.27100      14.0
25     -           -            -       -           -            -       0.26637     0.27500      15.9
26     -           -            -       -           -            -       0.26724     0.27410      17.0
27     -           -            -       -           -            -       0.26950     0.27180      18.6
28     -           -            -       -           -            -       0.27383     0.26750      20.1
29     -           -            -       -           -            -       0.27757     0.26390      21.1
30     -           -            -       -           -            -       0.28012     0.26150      21.6
31     -           -            -       -           -            -       0.27424     0.26710      47.3
∞      ≈ 0.26      ≈ 0.28       -       -           -            -       -           -            -

Recall from the previous section that we slightly modified the experimental setting for 31
nodes, namely we doubled the length of the MCs. The reason is as follows. We observed
an increasing trend in #EGs/#DAGs for 25-30 nodes, and interpreted this as an indication that we
might be reaching the limits of our experimental setting. Therefore, we decided to double
the length of the MCs for 31 nodes in order to see whether this broke the trend. As can be
seen in Table 1, it did. This suggests that approximating the ratio for more than 31 nodes
will require longer MCs and/or larger samples than the ones used in this work.

Note that we can approximate the number of EGs for up to 31 nodes as (#EGs/#DAGs) #DAGs,
where #EGs/#DAGs comes from Table 1 and #DAGs can be computed by Robinson (1977, Equation
8). Alternatively, we can approximate it as (#EGs/#EDAGs) #EDAGs, where #EGs/#EDAGs is the
reciprocal of the ratio #EDAGs/#EGs in Table 1 and #EDAGs can be computed by Steinsky
(2003, p. 270).

Finally, a few words on the running times reported in Table 1 may be in order. First, note
that the times reported in Table 1 for the exact ratios are borrowed from Gillispie and Perlman
(2002) and, thus, they correspond to a computer program run on a "mid-1990s-era, midrange
minicomputer". Therefore, a direct comparison to our times seems inadvisable. Second, our
new runs are around four times faster than our old ones. This may be due to the use of
a more powerful computer and/or a different version of the compiler. It cannot be due to
the difference between the computer programs run, since this is negligible. Third, our new times
have some oddities, e.g. the time for two nodes is greater than the time for three nodes. The
reason may be that the computer ran other programs while running the experiments reported
in this paper.

Table 2. Approximate #CEGs/#CDAGs, #CEGs/#EGs and #CDAGs/#DAGs.

       NEW APPROXIMATE
NODES  #CEGs/#CDAGs  #CEGs/#EGs  #CDAGs/#DAGs
2      0.51482       0.50730     0.66667
3      0.39334       0.63350     0.72000
4      0.32295       0.78780     0.82136
5      0.29471       0.90040     0.90263
6      0.28033       0.94530     0.95115
7      0.27799       0.97680     0.97605
8      0.26688       0.98860     0.98821
9      0.27164       0.99560     0.99415
10     0.26413       0.99710     0.99708
11     0.26170       0.99820     0.99854
12     0.26829       0.99940     0.99927
13     0.27407       0.99970     0.99964
14     0.27163       0.99990     0.99982
15     0.26253       1.00000     0.99991
16     0.26941       0.99990     0.99995
17     0.26942       1.00000     0.99998
18     0.27041       1.00000     0.99999
19     0.27130       1.00000     0.99999
20     0.26734       1.00000     1.00000
21     0.26463       1.00000     1.00000
22     0.27652       1.00000     1.00000
23     0.26569       1.00000     1.00000
24     0.27030       1.00000     1.00000
25     0.26637       1.00000     1.00000
26     0.26724       1.00000     1.00000
27     0.26950       1.00000     1.00000
28     0.27383       1.00000     1.00000
29     0.27757       1.00000     1.00000
30     0.28012       1.00000     1.00000
31     0.27424       1.00000     1.00000
∞      ?             ?           ≈ 1

3. Extension of Previous Work with New Ratios 



Gillispie and Perlman (2002, p. 153) state that "the variables chosen for inclusion in a
multivariate data set are not chosen at random but rather because they occur in a common
real-world context, and hence are likely to be correlated to some degree". This implies that the
EG learnt from some given data is likely to be connected. We agree with this observation, because
we believe that humans are good at detecting sets of mutually uncorrelated variables so
that the original learning problem can be divided into smaller independent learning problems,
each of which results in a CEG. Therefore, although we still cannot say which EGs will be visited
during the search, we can say that some of them will most likely be connected and some
others disconnected. This raises the question of whether #CEGs/#CDAGs ≈ #DEGs/#DDAGs, where
DEGs and DDAGs stand for disconnected EGs and disconnected DAGs. Gillispie and Perlman (2002,
p. 154) go on to say that a consequence of the learnt EG being connected is "that a substantial
number of undirected edges are likely to be present in the representative essential
graph, which in turn makes it likely that the corresponding equivalence class size will be
relatively large". In other words, they conjecture that the equivalence classes represented by
CEGs are relatively large. We interpret the term "relatively large" as having a ratio smaller
than #EGs/#DAGs. However, this conjecture does not seem to hold according to the approximate
ratios presented in Table 2. There, we can see that #CEGs/#CDAGs ≈ 0.26-0.28 for 6-31 nodes and,
thus, #CEGs/#CDAGs ≈ #EGs/#DAGs. That the two ratios coincide is not by chance, because
#CEGs/#EGs ≈ 0.95-1 for 6-31 nodes, as can be seen in the table. A problem of this ratio being
so close to 1 is that sampling a DEG is so unlikely that we cannot answer the question of whether
#CEGs/#CDAGs ≈ #DEGs/#DDAGs with our sampling scheme. Therefore, we have to content ourselves
with having learnt that #CEGs/#CDAGs ≈ #EGs/#DAGs. It is worth mentioning that this result is
somehow conjectured by Kocka when he states in a personal communication to Gillispie (2006,
p. 1411) that "large
equivalence classes are merely composed of independent classes of smaller sizes that combine
to make a single larger class". Again, we interpret the term "large" as having a ratio smaller
than #EGs/#DAGs. Again, we cannot check Kocka's conjecture because sampling a DEG is very
unlikely. However, we believe that the conjecture holds, because we expect the ratios for
those EGs with k connected components to be around 0.27^k, i.e. we expect the ratios of the
components to be almost independent of one another. Gillispie (2006, p. 1411) goes on to say
that "an equivalence class encountered at any single step of the iterative [learning] process,
a step which may involve altering only a small number of edges (typically only one), might
be quite small". Note that the equivalence classes that he suggests are quite small must
correspond to CEGs, because he suggested before that large equivalence classes correspond
to DEGs. We interpret the term "quite small" as having a ratio greater than #EGs/#DAGs. Again,
this conjecture does not seem to hold according to the approximate ratios presented in Table
2. There, we can see that #CEGs/#CDAGs ≈ 0.26-0.28 for 6-31 nodes and, thus,
#CEGs/#CDAGs ≈ #EGs/#DAGs.

Recall from Table 1 that we know the asymptotic values for #EGs/#DAGs and #EDAGs/#EGs. Note
also from the table that the asymptotic values are almost achieved for 6-7 nodes already. It
would also be nice to know the asymptotic values for #CEGs/#CDAGs and #CEGs/#EGs. From the
results in Table 2, they should be around 0.27 and 1, respectively. We are working on a formal
proof. In the meantime, we have proven a related result, namely that the ratio of CDAGs to
DAGs is asymptotically 1. The proof can be found in Appendix B. Note from Table 2 that
the asymptotic value is almost achieved for 6-7 nodes already. Our result adds to the list of
similar results in the literature, e.g. the ratio of labeled connected graphs to labeled graphs
is asymptotically 1 (Harary and Palmer, 1973, p. 205).

Note that we can approximate the number of CEGs for up to 31 nodes as (#CEGs/#EGs) #EGs,
where #CEGs/#EGs comes from Table 2 and #EGs can be computed as shown in the previous
section. Alternatively, we can approximate it as (#CEGs/#CDAGs) #CDAGs, where #CEGs/#CDAGs
comes from Table 2 and #CDAGs can be computed as shown in Appendix A.

Finally, note that the running times to obtain the results in Table 2 are the same as those
in Table 1, because both tables are based on the same samples.



4. Discussion 

Gillispie and Perlman (2002) showed that #EGs/#DAGs ≈ 0.27 for 7-10 nodes. Garrido (2009)
showed that asymptotically #EGs/#DAGs ≈ 0.26. We have shown in this paper that
#EGs/#DAGs ≈ 0.26-0.28 for 11-31 nodes. These results indicate that one should not expect
more than a moderate
gain in efficiency when searching the space of EGs instead of the space of DAGs. We have
also shown that #CEGs/#CDAGs ≈ 0.26-0.28 for 6-31 nodes and, thus,
#CEGs/#CDAGs ≈ #EGs/#DAGs. Therefore,
when searching the space of EGs, the fact that some of the EGs visited will most likely
be connected does not seem to imply any additional gain in efficiency beyond that due to
searching the space of EGs instead of the space of DAGs.

Some questions that remain open and that we would like to address in the future are
checking whether #CEGs/#CDAGs ≈ #DEGs/#DDAGs, and computing the asymptotic ratios of CEGs to
CDAGs, and of CEGs to EGs. Recall that in this paper we have proven that the asymptotic
ratio of CDAGs to DAGs is 1. Another topic for further research, already mentioned in
our previous work, would be improving the graphical modifications that determine the MC
transitions, because they rather often produce a graph that is not an EG. Specifically, the MC
transitions are determined by choosing uniformly one out of seven modifications to perform
on the current EG. Actually, one of the modifications leaves the current EG unchanged.
Therefore, around 14 % of the modifications cannot change the current EG and, thus, 86 %
of the modifications can change the current EG. In our experiments, however, only 6-8 % of
the modifications change the current EG. The rest up to the mentioned 86 % produce a graph
that is not an EG and, thus, they leave the current EG unchanged. This problem has been
previously pointed out by Perlman (2000). Furthermore, he presents a set of more complex
modifications that are claimed to alleviate the problem just described. Unfortunately, no
evidence supporting this claim is provided. More recently, He et al. (2012) have proposed an
alternative set of modifications having a series of desirable features that ensure that applying
the modifications to an EG results in a different EG. Although these modifications are more
complex than ours, He et al. (2012, pp. 19-20) claim that their MCMC approach is thousands
of times faster than our previous work for 3, 4 and 6 nodes. However, it seems unfair to us to
compare these two approaches: Whereas we run 10^4 MCs of 10^6 transitions each to obtain a
sample, they only run one MC of 10^4-10^5 transitions. They seem to have missed this point
in their comparison. Therefore, it is not clear to us how their MCMC approach scales to 10-30
nodes as compared to ours. The point of developing modifications that are more effective
than ours at producing EGs is to make better use of the running time by minimizing the
number of graphs that have to be discarded. However, this improvement in effectiveness
has to be weighed against the computational cost of the modifications, so that the MCMC
approach still scales to the number of nodes of interest.
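
The figures quoted in this discussion can be put side by side with a small calculation. The sketch below only restates numbers given in the text (10^4 chains of 10^6 transitions versus one chain of 10^4-10^5 transitions, and 6-8 % of the modifications changing the current EG); the function and constant names are hypothetical.

```python
# Transition budgets as stated in the text.
OUR_CHAINS, OUR_TRANSITIONS = 10**4, 10**6
THEIR_CHAINS = 1
THEIR_TRANSITIONS_RANGE = (10**4, 10**5)


def total_transitions(n_chains, n_transitions):
    """Total number of MC transitions performed to obtain one sample."""
    return n_chains * n_transitions


def budget_ratio_range():
    """How many times more transitions our setting performs (lowest and highest factor)."""
    ours = total_transitions(OUR_CHAINS, OUR_TRANSITIONS)
    lowest = ours // total_transitions(THEIR_CHAINS, THEIR_TRANSITIONS_RANGE[1])
    highest = ours // total_transitions(THEIR_CHAINS, THEIR_TRANSITIONS_RANGE[0])
    return lowest, highest


def effective_moves_per_chain(change_rates=(0.06, 0.08)):
    """Transitions per chain that actually change the EG, given the observed 6-8 % rate."""
    return tuple(round(p * OUR_TRANSITIONS) for p in change_rates)
```

Under these figures, our setting performs between 10^5 and 10^6 times as many transitions per sample, of which roughly 60,000 to 80,000 per chain actually change the current EG.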



Appendix A: Counting CDAGs 
Let $A(x)$ denote the exponential generating function for DAGs. That is,

\[ A(x) = \sum_{k=1}^{\infty} \frac{A_k}{k!} x^k \]

where $A_k$ denotes the number of DAGs of order $k$. Likewise, let $a(x)$ denote the exponential
generating function for CDAGs. That is,

\[ a(x) = \sum_{k=1}^{\infty} \frac{a_k}{k!} x^k \]

where $a_k$ denotes the number of CDAGs of order $k$. Note that $A_k$ can be computed without
having to resort to enumeration by Robinson (1977, Equation 8). However, we do not know
of any formula to compute $a_k$ without enumeration. Luckily, $a_k$ can be computed from $A_k$ as
follows. First, note that

\[ 1 + A(x) = e^{a(x)} \]

as shown by Harary and Palmer (1973, pp. 8-9). Now, let us define $A_0 = 1$ and redefine $A(x)$
as

\[ A(x) = \sum_{k=0}^{\infty} \frac{A_k}{k!} x^k \]

i.e. the summation starts with $k = 0$. Then,

\[ A(x) = e^{a(x)}. \]

Consequently,

\[ \frac{A_n}{n!} = \frac{1}{n} \sum_{k=1}^{n} k \, \frac{a_k}{k!} \, \frac{A_{n-k}}{(n-k)!} \]

as shown by Harary and Palmer (1973, pp. 8-9), and thus

\[ a_n = A_n - \frac{1}{n} \sum_{k=1}^{n-1} k \binom{n}{k} a_k A_{n-k}. \]

See also Castelo (2002, pp. 38-39).
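
The last recurrence lends itself to a direct implementation. The following Python sketch is an illustration under stated assumptions: count_dags uses the standard inclusion-exclusion recurrence for labeled DAGs, which we assume matches the cited equation, and both function names are hypothetical.

```python
from math import comb


def count_dags(n_max):
    """A_0..A_{n_max}: numbers of labeled DAGs (assumed form of the DAG recurrence), A_0 = 1."""
    A = [1]
    for n in range(1, n_max + 1):
        A.append(sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * A[n - k]
                     for k in range(1, n + 1)))
    return A


def count_connected_dags(n_max):
    """a_0..a_{n_max}: numbers of connected labeled DAGs via
    a_n = A_n - (1/n) * sum_{k=1}^{n-1} k * C(n, k) * a_k * A_{n-k}, with a_0 set to 0."""
    A = count_dags(n_max)
    a = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        s = sum(k * comb(n, k) * a[k] * A[n - k] for k in range(1, n))
        a[n] = A[n] - s // n  # s is always a multiple of n, so integer division is exact
    return a
```

Dividing count_connected_dags(n)[n] by count_dags(n)[n] gives the ratio of CDAGs to DAGs; for 2 to 5 nodes this yields 0.66667, 0.72000, 0.82136 and 0.90263 after rounding.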



Appendix B: Asymptotic Behavior of CDAGs 

Theorem 1. The ratio of CDAGs of order n to DAGs of order n tends to 1 as n tends to 
infinity. 

Proof. Let $A_n$ and $a_n$ denote the numbers of DAGs and CDAGs of order $n$, respectively.
Specifically, we prove that $(A_n/n!)/(a_n/n!) \to 1$ as $n \to \infty$. By Wright (1967, Theorem 6),
this holds if the following three conditions are met:

(i) $\log((A_n/n!)/(A_{n-1}/(n-1)!)) \to \infty$ as $n \to \infty$,

(ii) $\log((A_{n+1}/(n+1)!)/(A_n/n!)) > \log((A_n/n!)/(A_{n-1}/(n-1)!))$ for all large enough $n$,
and

(iii) $\sum_{k=1}^{\infty} (A_k/k!)^2/(A_{2k}/(2k)!)$ converges.

We start by proving that condition (i) is met. Note that from every DAG $G$ over the
nodes $\{v_1, \ldots, v_{n-1}\}$ we can construct $2^{n-1}$ different DAGs $H$ over $\{v_1, \ldots, v_n\}$ as follows:
Copy all the arrows from $G$ to $H$ and make $v_n$ a child in $H$ of each of the $2^{n-1}$ subsets of
$\{v_1, \ldots, v_{n-1}\}$. Therefore,

\[ \log((A_n/n!)/(A_{n-1}/(n-1)!)) > \log(2^{n-1}/n) \]

which clearly tends to infinity as $n$ tends to infinity.

We continue by proving that condition (ii) is met. Every DAG over the nodes $V \cup \{w\}$
can be constructed from a DAG $G$ over $V$ by adding the node $w$ to $G$ and making it a child
of a subset $Pa$ of $V$. If a DAG can be so constructed from several DAGs, we simply consider
it as constructed from one of them. Let $H_1, \ldots, H_m$ represent all the DAGs so constructed
from $G$. Moreover, let $Pa_i$ denote the subset of $V$ used to construct $H_i$ from $G$. From each
$Pa_i$, we can now construct $2m$ DAGs over $V \cup \{w, u\}$ as follows: (i) Add the node $u$ to $H_i$ and
make it a child of each subset $Pa_j \cup \{w\}$ with $1 \leq j \leq m$, and (ii) add the node $u$ to $H_i$ and
make it a parent of each subset $Pa_j \cup \{w\}$ with $1 \leq j \leq m$. Therefore, $A_{n+1}/A_n > 2A_n/A_{n-1}$
and thus

\[ \log((A_{n+1}/(n+1)!)/(A_n/n!)) = \log(A_{n+1}/A_n) - \log(n+1) > \log(2A_n/A_{n-1}) - \log(n+1) \]
\[ \geq \log(2A_n/A_{n-1}) - \log(2n) = \log(A_n/A_{n-1}) - \log n = \log((A_n/n!)/(A_{n-1}/(n-1)!)). \]

Finally, we prove that condition (iii) is met. Let $G$ and $G'$ denote two (not necessarily
distinct) DAGs of order $k$. Let $V = \{v_1, \ldots, v_k\}$ and $V' = \{v'_1, \ldots, v'_k\}$ denote the nodes in $G$
and $G'$, respectively. Consider the DAG $H$ over $V \cup V'$ that has the union of the arrows in $G$
and $G'$. Let $w$ and $w'$ denote two nodes in $V$ and $V'$, respectively. Let $S$ be a subset of size
$k - 1$ of $V \cup V' \setminus \{w, w'\}$. Now, make $w$ a parent in $H$ of all the nodes in $S \cap V'$, and make
$w'$ a child in $H$ of all the nodes in $S \cap V$. Note that the resulting $H$ is a DAG of order $2k$.
Note that there are $k^2$ different pairs of nodes $w$ and $w'$. Note that there are $\binom{2k-2}{k-1}$ different
subsets of size $k - 1$ of $V \cup V' \setminus \{w, w'\}$. Note that every choice of DAGs $G$ and $G'$, nodes
$w$ and $w'$, and subset $S$ gives rise to a different DAG $H$. Therefore, $A_{2k}/A_k^2 > k^2 \binom{2k-2}{k-1}$ and
thus

\[ \sum_{k=1}^{\infty} \frac{(A_k/k!)^2}{A_{2k}/(2k)!} = \sum_{k=1}^{\infty} \frac{A_k^2 \, (2k)!}{A_{2k} \, k!^2} < \sum_{k=1}^{\infty} \frac{(k-1)! \, (k-1)! \, (2k)!}{k^2 \, (2k-2)! \, k!^2} = \sum_{k=1}^{\infty} \frac{4k - 2}{k^3} \]

which clearly converges. □
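
As a sanity check, the three conditions can be verified numerically for small orders. The sketch below is not part of the proof and a finite check proves nothing about the limit; it assumes the standard recurrence for labeled DAGs, uses exact rational arithmetic, and all names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def count_dags(n_max):
    """A_0..A_{n_max}: numbers of labeled DAGs (assumed form of the DAG recurrence)."""
    A = [1]
    for n in range(1, n_max + 1):
        A.append(sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * A[n - k]
                     for k in range(1, n + 1)))
    return A


def ratio(A, n):
    """(A_n / n!) / (A_{n-1} / (n-1)!), which simplifies to A_n / (n * A_{n-1})."""
    return Fraction(A[n], n * A[n - 1])


def check_conditions(n_max):
    """Check the bounds used for conditions (i)-(iii) for orders 2..n_max (terms 1..n_max)."""
    A = count_dags(2 * n_max)
    # (i): the ratio exceeds 2^(n-1)/n, which grows without bound.
    cond_i = all(ratio(A, n) > Fraction(2 ** (n - 1), n) for n in range(2, n_max + 1))
    # (ii): the ratios are increasing (log is monotone, so logs can be dropped).
    cond_ii = all(ratio(A, n + 1) > ratio(A, n) for n in range(2, n_max + 1))
    # (iii): each term of the series is below the matching term (4k - 2)/k^3.
    terms = [Fraction(A[k] ** 2 * factorial(2 * k), A[2 * k] * factorial(k) ** 2)
             for k in range(1, n_max + 1)]
    cond_iii = all(t < Fraction(4 * k - 2, k ** 3) for k, t in enumerate(terms, start=1))
    return cond_i, cond_ii, cond_iii, float(sum(terms))
```

Calling check_conditions(10) returns three booleans, one per condition, together with the partial sum of the series in condition (iii).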